The goal of this project is to forecast product demand for Big Pharma, a large pharmaceutical distribution company in Germany.
The Business Problem
Big Pharma restock their warehouses monthly, but often run into issues with:
Tasks
The goal is to offer Big Pharma a solution to their problem. The proposed solution is a time series forecast of their product demand. We begin with a pilot test to forecast the quantity of products the company should purchase for their warehouses in the coming month.
Data
The dataset contains product demand data from October 2020 to July 2021 with the following fields:
Below is a summary of the data.
#>       date             product_id       stock_demand
#>  Min.   :2020-10-01   N0SI1  :   1482   Min.   :-12226.00
#>  1st Qu.:2020-12-10   AL10C  :   1269   1st Qu.:     3.00
#>  Median :2021-02-23   ET0N1  :   1259   Median :     9.00
#>  Mean   :2021-02-21   V0EL1  :   1250   Mean   :    79.71
#>  3rd Qu.:2021-05-04   1PO0L  :   1195   3rd Qu.:    31.00
#>  Max.   :2021-07-31   0TR2A  :   1188   Max.   :149004.00
#>                       (Other):1040932
The minimum demand value is -12226. However, according to our
metadata dictionary, stock demand is the number of boxes of the
product that were purchased, so it should not be negative.
It turns out that 6,808 records have a stock demand below 0.
To handle this, we make the following assumption.
Assumption:
Negative stock_demand values represent shortages. To forecast
them accurately, I converted them to positive values and
modelled them as real demand values. This ensures that stock in
the warehouse meets customer demand.
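The conversion described in the assumption can be sketched as follows. This is a minimal illustration in plain Python rather than the code used in the project; the sample rows and the helper name `negatives_to_demand` are hypothetical.

```python
from datetime import date

# Hypothetical sample rows: (date, product_id, stock_demand).
records = [
    (date(2020, 10, 1), "N0SI1", 12),
    (date(2020, 10, 1), "AL10C", -5),
    (date(2020, 10, 2), "N0SI1", -12226),
]


def negatives_to_demand(rows):
    """Return a copy of rows with negative stock_demand made positive."""
    return [(day, product_id, abs(qty)) for day, product_id, qty in rows]


# Count the affected records before converting them.
n_negative = sum(1 for _, _, qty in records if qty < 0)
cleaned = negatives_to_demand(records)
```

Counting the negative records first makes it easy to confirm how many rows the assumption touches before the values are overwritten.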
Next, there are over 7,000 unique products available for analysis and forecasting. For this test case, I restricted the analysis to the products with the highest stock_demand. This allowed me to focus on the most important products and quickly iterate to generate a working solution.
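One way to pick such a subset is to rank products by their total demand over the period. The sketch below assumes that ranking rule; the function name `top_products`, the cut-off `n` and the sample rows are hypothetical and are not taken from the project.

```python
from collections import defaultdict


def top_products(rows, n):
    """Return the n product ids with the largest total stock_demand.

    rows is an iterable of (date, product_id, stock_demand) tuples.
    Ties are broken alphabetically by product id.
    """
    totals = defaultdict(float)
    for _, product_id, qty in rows:
        totals[product_id] += qty
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [product_id for product_id, _ in ranked[:n]]


# Hypothetical sample rows.
sample = [
    ("2020-10-01", "N0SI1", 40),
    ("2020-10-01", "AL10C", 25),
    ("2020-10-02", "N0SI1", 10),
    ("2020-10-02", "ET0N1", 5),
]
selected = top_products(sample, n=2)
```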
The plot below shows the time series for the products with the
highest demand over the period under consideration. The y-axis shows
the log of stock_demand, which makes seasonal patterns easier to
identify and reduces the variance of the observations, making the
series better suited to modelling. Below is a seasonal decomposition
of the time series for one product.
The plot above helps us identify time components we can include in our model to capture potential seasonal patterns. Demand is clearly higher on weekdays. On a monthly scale, July seems to have the lowest stock demand.
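The weekday and month effects noted above can be exposed to a model as calendar features derived from the date. The following is a minimal sketch; the feature names are hypothetical, and using `log1p` (so that a demand of zero stays defined) is an assumption, since the text only says the values were logged.

```python
import math
from datetime import date


def calendar_features(day, stock_demand):
    """Build weekday and month features plus a log-scaled demand value."""
    return {
        "weekday": day.weekday(),           # Monday = 0 ... Sunday = 6
        "is_weekend": day.weekday() >= 5,   # Saturday or Sunday
        "month": day.month,
        "log_demand": math.log1p(stock_demand),  # assumes non-negative demand
    }


first_day = calendar_features(date(2020, 10, 1), 9)
last_day = calendar_features(date(2021, 7, 31), 0)
```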
Three models were developed for the problem and configured so that the most suitable model will be chosen for each individual product.
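The per-product selection can be illustrated with a holdout comparison. The three forecasters below (naive, seasonal naive and moving average) are simple stand-ins chosen for the sketch and are not the models used in the project; selecting by mean absolute error on the last few days of each series is likewise an assumption.

```python
def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive(history, horizon, period=7):
    """Repeat the last observed weekly cycle."""
    last_cycle = history[-period:]
    return [last_cycle[i % len(last_cycle)] for i in range(horizon)]


def moving_average(history, horizon, window=7):
    """Repeat the mean of the most recent observations."""
    recent = history[-window:]
    return [sum(recent) / len(recent)] * horizon


# Stand-in candidates, keyed by a hypothetical model name.
CANDIDATES = {
    "naive": naive,
    "seasonal_naive": seasonal_naive,
    "moving_average": moving_average,
}


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def select_model(series, holdout=7):
    """Score every candidate on the last `holdout` points; return the best."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {
        name: mae(test, forecast(train, len(test)))
        for name, forecast in CANDIDATES.items()
    }
    return min(scores, key=scores.get), scores


def select_per_product(series_by_product, holdout=7):
    """Map each product id to the name of its best-scoring candidate."""
    return {
        product_id: select_model(series, holdout)[0]
        for product_id, series in series_by_product.items()
    }


# Hypothetical daily series with a strong weekly pattern.
weekly = [10, 12, 11, 13, 12, 2, 1] * 4
best_by_product = select_per_product({"N0SI1": weekly})
```

Scoring each product separately means a product with a strong weekly pattern and a product with flat demand can end up with different models.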
Questions
A table with other metrics that can be used to evaluate each model's performance for each product is included in the final solution for completeness and transparency.
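A table of this kind could be assembled as sketched below. The source does not name the metrics, so MAE, RMSE and MAPE are assumed here as common choices, and the function names and sample numbers are hypothetical.

```python
import math


def evaluate(actual, predicted):
    """Return MAE, RMSE and MAPE (in percent) for one forecast."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # MAPE is undefined where the actual value is zero, so skip those points.
    ratios = [abs(e) / abs(a) for e, a in zip(errors, actual) if a != 0]
    mape = 100 * sum(ratios) / len(ratios) if ratios else float("nan")
    return {"mae": mae, "rmse": rmse, "mape": mape}


def metrics_table(results):
    """Build one row per (product_id, model) pair.

    results maps (product_id, model_name) to an (actual, predicted) pair.
    """
    return [
        {"product_id": product_id, "model": model, **evaluate(actual, predicted)}
        for (product_id, model), (actual, predicted) in sorted(results.items())
    ]


# Hypothetical holdout values and forecasts.
table = metrics_table({
    ("N0SI1", "naive"): ([10, 20], [12, 16]),
    ("AL10C", "naive"): ([0, 4], [1, 4]),
})
```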
How would you build a machine learning pipeline for your model?
An end-to-end pipeline, from data acquisition and cleaning to modelling, training and deployment, would follow almost the same workflow used here. Additional steps would include testing and version control.
How would you measure the impact your model has on the company’s operations?
The impact of this model on the company’s operations could be measured by:
These two business metrics will serve as good barometers for how well the solution is performing and whether or not it requires further adjustment.
The solution, a time series forecast 30 days into the future, is deployed on shinyapps.io and can be accessed here.